A novel molecular signature identifies mixed subtypes in renal cell carcinoma with poor prognosis and independent response to immunotherapy

Background Renal cell carcinoma (RCC) is a heterogeneous disease comprising histologically defined subtypes. For therapy selection, precise subtype identification and individualized prognosis are mandatory, but currently limited. Our aim was to refine subtyping and outcome prediction across main subtypes, assuming that a tumor is composed of molecular features present in distinct pathological subtypes. Methods Individual RCC samples were modeled as linear combination of the main subtypes (clear cell (ccRCC), papillary (pRCC), chromophobe (chRCC)) using computational gene expression deconvolution. The new molecular subtyping was compared with histological classification of RCC using the Cancer Genome Atlas (TCGA) cohort (n = 864; ccRCC: 512; pRCC: 287; chRCC: 65) as well as 92 independent histopathologically well-characterized RCC. Predicted continuous subtypes were correlated to cancer-specific survival (CSS) in the TCGA cohort and validated in 242 independent RCC. Association with treatment-related progression-free survival (PFS) was studied in the JAVELIN Renal 101 (n = 726) and IMmotion151 trials (n = 823). CSS and PFS were analyzed using the Kaplan–Meier and Cox regression analysis. Results One hundred seventy-four signature genes enabled reference-free molecular classification of individual RCC. We unambiguously assign tumors to either ccRCC, pRCC, or chRCC and uncover molecularly heterogeneous tumors (e.g., with ccRCC and pRCC features), which are at risk of worse outcome. Assigned proportions of molecular subtype-features significantly correlated with CSS (ccRCC (P = 4.1E − 10), pRCC (P = 6.5E − 10), chRCC (P = 8.6E − 06)) in TCGA. Translation into a numerical RCC-R(isk) score enabled prognosis in TCGA (P = 9.5E − 11). Survival modeling based on the RCC-R score compared to pathological categories was significantly improved (P = 3.6E − 11). The RCC-R score was validated in univariate (P = 3.2E − 05; HR = 3.02, 95% CI: 1.8–5.08) and multivariate analyses including clinicopathological factors (P = 0.018; HR = 2.14, 95% CI: 1.14–4.04). Heterogeneous PD-L1-positive RCC determined by molecular subtyping showed increased PFS with checkpoint inhibition versus sunitinib in the JAVELIN Renal 101 (P = 3.3E − 04; HR = 0.52, 95% CI: 0.36 − 0.75) and IMmotion151 trials (P = 0.047; HR = 0.69, 95% CI: 0.48 − 1). The prediction of PFS significantly benefits from classification into heterogeneous and unambiguous subtypes in both cohorts (P = 0.013 and P = 0.032). Conclusion Switching from categorical to continuous subtype classification across most frequent RCC subtypes enables outcome prediction and fosters personalized treatment strategies. Supplementary Information The online version contains supplementary material available at 10.1186/s13073-022-01105-y.

from R-package SCAN.UPC. The data accession number at the European Genomephenome Archive (www.ebi.ac.uk/ega/home), which is hosted by the EBI and the CRG, is EGAS00001001176.
Genome-wide transcriptome data produced by the GeneChip™ Human Genome U133 Plus 2.0 microarray (Thermo Fisher Scientific) for the 158 RCC patients [4][5][6][7][8][9][10] of C2 were downloaded from Gene Expression Omnibus (GEO) using R-package GEOquery (Additional File 4. Table S2). Microarrays from C2 were normalized individually using the SCAN method from the R-package SCAN.UPC. For all microarrays, probe sets were summarized at Entrez GeneID level using the annotation provided by brainarray for the respective chip type.
Ensembl gene identifiers used in TCGA gene expression data ("FPKM-UQ") were mapped to Entrez GeneIDs using the annotation provided by the org.Hs.eg.db Rpackage.
HUGO gene symbols used in gene expression data from the JAVELIN Renal 101 trial [11] were mapped to Entrez GeneIDs using the annotation provided by the org.Hs.eg.db R package.
Li et al. [12] compared whole-transcriptome profiles based on RNA-Seq using matched FFPE and fresh-frozen (FF) samples. Unmapped sequencing data were downloaded from the Sequence Read Archive (SRP050335) and were processed using the nfcore/rna-seq pipeline (version 1.4.2) [13]. We used the GENCODE gene annotation (version 22), which was also used for generation of the TCGA expression data. Entrez GeneIDs were obtained using the org.Hs.eg.db R package. PSA were calculated using FPKM values computed by StringTie [14].

Gene expression deconvolution using robust linear regression
Samples from RCC tissue were considered as composite samples that may combine specific molecular features from ccRCC, pRCC and chRCC. Further, it was assumed that the proportional composition of the subtypes is reflected in the gene expression profile of RCC samples. Gene expression deconvolution is a method that allows estimating the proportion of different cell types in heterogeneous samples [15,16].
The expression of a given gene in a RCC sample was modeled as weighted sum of the expression of this gene in ccRCC, pRCC and chRCC. The weights correspond to the proportional composition of and therefore are the same for all genes.
Mathematically, the objective of deconvolution is to find a solution to the system of linear equations: = × . Here, the vector of coefficients represents the unknown proportions to be determined for a potentially heterogeneous sample . Gene expression levels of the signature genes in are denoted by . is a signature matrix including the (median) expression values of the signature genes in ccRCC, pRCC, and chRCC. The matrix equation can be solved for using standard linear least squares regression [17]. Robust linear regression as implemented in the "rlm" function from the R-package MASS (parameter maxit was set to 200) was used to increase stability of assignments. Expression deconvolution was performed on linear, i.e. non-log2transformed, expression data as suggested by Zhong and Liu [18]. Log2 expression values from microarray analysis were therefore inverted to linear space. Raw counts from RNA-Seq measurement had to be normalized for sequencing depth and gene length. Linear expression values of the signature genes were centered to zero mean and scaled to unit variance preceding deconvolution. The proportional subtype assignment (PSA) was computed by setting negative coefficients to 0 and dividing the three coefficients by their sum, such that + + ℎ = 100%, with , , and ℎ representing the ccRCC, pRCC, and chRCC proportion, respectively. Subsequently, percentages were rounded to integers.
A permutation P-value ( ) was calculated to assess the specificity of the signature for a certain tumor sample. Basically, the P-value computation was carried out in the same way as described in Newman et al. [19]. Briefly, Pearson's correlation coefficient between and × was compared against a derived null distribution * for sample .
Expression values in were replaced by randomly drawn values from the full transcriptome of , denoted * . Subtype proportions * were determined for * by deconvolution and the correlation coefficient between * and * × was calculated.

Generation of the gene signature
The samples of C1, which were evaluated by two teams of pathologists, were considered as unambiguous cases of ccRCC, pRCC and chRCC, respectively (Additional file 2. Fig. S1). Initially, expression measurements of 32,749 genes were available for C1 based on the annotation provided by brainarray for Human Transcriptome Array 2.0. Expression data were sample-wise centered to the median expression level of C1. Genes with median expression below the cohort median in each of the three subtypes were removed. Further, genes not covered by the Human Genome U133 Plus 2.0 microarray according to the brainarray annotation or in the RNA expression data from TCGA were excluded.
It has been shown using TCGA data that RCC subtypes vary in tumor purity, see http://bioinformatics.mdanderson.org/estimate/. This pattern could be observed in C1 as well (Additional file 2. Fig. S2A). Therefore, genes stronger related to tumor purity as determined by the ESTIMATE method than to tumor type were removed to minimize the influence of tumor purity on the determination of PSA. To be precise, for each gene linear regression models were fit incorporating either tumor type or tumor purity as single predictor or both of them in a multiple regression model. In the latter case, models with and without interaction effects were considered. The reduction in the residual sum of squares was compared by analysis of deviance tests. Per variable, i.e.
purity or type, the model with single predictor was tested vs. both corresponding multiple regression models and the lower of the two resulting P-values was used. A gene was kept in case its expression in C1 could be better explained by tumor type than by tumor purity (Additional file 2. Fig. S2B). 11,195 genes remained after these filtering steps (Additional file 2. Fig. S2C) and were subsequently tested for subtype-specific expression by analysis of variance. 5,811 genes showed significant variation between subtypes after correction for multiple testing using Holm's method [20]. From this set, subtype-specific genes were determined using Tukey's range test. If expression differed significantly in two of the three pairwise comparisons between subtypes, a gene was considered subtype-specific, resulting in 1,379 ccRCC-, 844 pRCC, and 1,463 chRCCspecific genes (total 3,686). Expression values of the candidate genes in C1 were collapsed by taking the median per subtype. For each candidate, the minimum log2 fold change compared to the two respective other entities was calculated using the absolute values of the log2 fold changes (Additional file 2. Fig. S2D). Finally, genes were ordered by decreasing log2 fold change per subtype and expression values were transformed to linear space.
Next, the signature genes were selected from the set of subtype-specific genes. The expression of the signature genes in ccRCC, pRCC, and chRCC constituted the signature matrix . As in similar studies [17,19,21], various signature matrices were tested (Additional file 2.  proportion (mean proportion of tumors with MAD > 5%: 8.3%) of assignments relative to its predecessor matrix and therefore has been chosen as signature matrix (Additional file 2. Fig. S3B; Additional file 6. Table S4).

Variability in gene expression explained by subtype classifications
We assumed that differences between histological subtypes are comprehensively reflected by gene expression levels. Consequently, the information contained in PSA or pathological classification can be assessed based on the variance in expression data explained by these factors. To this end, a principal variance component analysis-like approach was applied [22]. First eigenvalues and eigenvectors of the covariance matrix of gene-wise mean-centered expression data were determined. Subsequently, linear regression was performed for each eigenvector with pathological categorization or PSA as explanatory variable. Subsequently, coefficients of determination (R-squared) from the resulting regression models were used to weight the eigenvalues correspondingly.
Dividing the sum of weighted eigenvalues by the total variance (i.e. the sum of the unweighted eigenvalues) resulted in the proportion of the explained variance for a certain variable.
The RCC population, here including ccRCC, pRCC, and chRCC from C3, was modeled using the prevalence of the subtypes reported in the recent edition of the WHO classification of tumors [23]. Based on the given ranges of frequencies (60-75% ccRCC,  (Fig. 3D). FPKM-UQ expression values were log2transformed (log2(x+1)).

Relationship between PSA and the occurrence of somatic mutations
The analysis of somatic mutations in C3 was limited to 344 genes [24] frequently mutated in RCC subtypes. Further, genes classified as FLAGS [25] were excluded, resulting in 310 genes with somatic mutation calls in 620 samples. Notably, the bi-allelic loss status of e.g. VHL, TP53 was not considered in this analysis. For each gene and subtype, the distribution of assigned proportions was compared between tumors with and without a somatic mutation using the Cramer-von Mises test as implemented in the R-package twosamples (Fig. S6). Permutation P-values were determined based on 10,000 bootstrap iterations.

Relationship between PSA and computational histopathology.
In the study of Fu et al. [26], tissue slides from 28 TCGA cohorts were cropped into tiles Using the histopathologic features, the Manhattan distance between tiles from the same slide as well as from different slides was calculated. We found that the mean distance between tiles of the same slide was of similar size as the respective mean distance between tiles of different slides from the same tumor (Additional file 2. Fig. S7). The mean pairwise Manhattan distance between the 50 tiles per tumor tissue slide was used as a measure of histopathological complexity.

Relationship between PSA and patient survival
Our approach introduced here estimates three percentage values per sample, representing the predicted proportions of ccRCC, pRCC, and chRCC. Hereinafter, the terms "proportion" and "score" are used interchangeably. The relationship between numerical subtype scores and cancer-specific survival (CSS) was investigated in cohort C3. Linear predictors from Cox proportional hazard models were used as prognostic index (PI). Consequently, the PI for a patient is the log relative hazard compared to a hypothetical individual with a PI of 0 and the log hazard ratio between two patients was derived from the difference in their PI.
First, univariate analyses using ccRCC, pRCC, as well as chRCC proportions as single predictors were performed, respectively (Additional file 2. Fig. S8). The proportions were modeled via restricted cubic splines (RCS) using the "rcs" function from the R-package rms. RCS are flexible functions that produce smooth fits between outcome and predictor and are particularly useful when the type of relationship is unknown. The predictor variable is subdivided into segments using a set of knots, and a cubic polynomial is fitted in each segment. Here, smooth joints at the knots and linearity in the two tails are required. Each time a variable was transformed with RCS, three, four, and five knots were pre-tested using the Akaike information criterion (AIC), as suggested by Harrell [27]. As only two of the subtype scores are independent (since all three sum up to 100%), the chRCC-score which exhibited the lowest variability (Additional file 2. Fig.   S8A) as well as the weakest association to survival (Fig. S8B) was excluded from the following analyses. Figure S8B (Additional file 2) shows that tumors with medium scale proportions of ccRCC and pRCC had the highest risk. The curves peaked at 39% (ccRCC) and 58% (pRCC), respectively, and with in-/decrease of the respective proportion the risk decreased for both subtype scores. The shape of both graphs suggested a cubic relationship to the log relative hazard. Hence, both scores were additionally modeled via cubic polynomials. The estimates from the RCS regression and the cubic polynomial regression were within the standard error of the respective other fit (Additional file 2. Fig. S10). Moreover, cubic polynomials and corresponding RCS yielded very similar predictions over the entire range with a maximum absolute difference in the PI of 0.0631 (ccRCC-score) and 0.0225 (pRCC-score), respectively.
In the survival analysis of cohort C3 (see Fig. 5A-C), ccRCC-and pRCC-score were used in (additive) combination in Cox regression analysis, with both being modeled using RCS. For 17 tumors of C3 for which survival data were not available the PI was predicted using the coefficients from the fitted Cox model.

Risk prediction of RCC using PSA and development of the RCC-R score
The predictive accuracy of ccRCC-and pRCC-score as well as their combination was compared by 10-fold cross validation (CV) in cohort C3. Modeling with RCS as well as with cubic polynomials was tested. As suggested by Dai and Breheny [28], the CV was applied to the linear predictors from the Cox models. To be more precise, for each fold a Cox model was fitted on the training set and then applied to estimate linear predictors on the test set. The linear predictors of all 10 test sets were pooled together to construct a set of linear predictors for the whole cohort. A partial likelihood was then obtained by regression of CSS on the combined set of linear predictors. Since the folds were selected at random, the CV was repeated 300 times to obtain stable estimates of the predictive accuracy for each tested model (Additional file 2. Fig. S11A). For comparison, the pathological classification into ccRCC, pRCC, and chRCC was evaluated as categorical predictor of survival. Cubic polynomial models were of equal or better quality compared to their RCS counterparts. Therefore, results from RCS fits are not discussed in the following. As expected, the pRCC-score achieved less predictive accuracy than the ccRCC-score. This is due to the fact that the pRCC-score assigns low or zero proportions to ccRCC as well as chRCC samples (Fig. 3B), which however differ strongly in their survival rates. In contrast, the ccRCC-score equates pRCC and chRCC tissues (Fig. 3A), which are more similar in terms of CSS. The (additive) combination of both scores turned out to be less suitable for risk prediction in unknown data because of its higher model complexity. Pathological classification performed worse than the PSAbased predictors. Hence, the ccRCC-score modeled via cubic polynomials, hereafter termed RCC-R score, showed the best trade-off between model complexity and predictive accuracy and consequently was selected as biomarker for risk prediction.
With PSA specified on 0 -1 scale, the PI for a RCC sample with a RCC-R score or ccRCC proportion of c was determined as follows:  Fig. SA10, Fig. 5D).

External validation of the RCC-R score
The validation of the RCC-R score as a risk score followed the proposals of Royston and Altman [29]. The validation cohort C5 comprised 242 RCC samples (Additional file 2. Fig. S12A). First, the PI in C5 (PIC5) was constructed using equation 1 for 241 tumors from C5 with < 0.05. Discrimination, i.e. the ability of the model to distinguish between patients with different risks, was evaluated by regression of CSS on PIC5. The resulting calibration slope was 1.106 (SE = 0.265) and thus not significantly different from 1 (P = 0.69 by z-test) indicating that discrimination was preserved in C5. The concordance between observed outcome and prediction, i.e. the calibration of the model, was assessed by comparing Kaplan-Meier estimates for C5 with predicted survival probabilities. First, risk groups were formed in C3. This was done by categorizing the PI using conditional inference trees (R-package partykit, function "ctree") with endpoint CSS. Three risk groups with significantly different survival functions were identified in this way (Fig. 5D). The corresponding cutoff values were then used to subdivide C5 based on PIC5 (Additional file 2. Fig. S12B). Next, CSS probabilities were predicted for patients from C5. Five patients from C5 with follow-up time exceeding the maximal follow-up time in C3 (16.2 years) were disregarded here.
Using PIC5 and the baseline survival function from the Cox model evaluated in cohort C3, estimates of survival probabilities along with pointwise 95% prediction intervals were calculated for each risk group in C5 at half-yearly intervals from 0 to 16 years (Additional file 2. Fig. S12C). Comparison of Kaplan-Meier estimates for the risk groups in C5 with the predicted CSS probabilities showed that the model slightly